System and Method for Synthetic-Model-Based Benchmarking of AI Hardware

ABSTRACT

Embodiments described herein provide a system for facilitating efficient benchmarking of a piece of hardware configured to process artificial intelligence (AI) related operations. During operation, the system determines the workloads of a set of AI models based on layer information associated with a respective layer of a respective AI model. The set of AI models are representative of applications that run on the piece of hardware. The system forms a set of workload clusters from the workloads and determines a representative workload for a workload cluster. The system then determines, using a meta-heuristic, an input size that corresponds to the representative workload. The system determines, based on the set of workload clusters, a synthetic AI model configured to generate a workload that represents statistical properties of the workloads on the piece of hardware. The input size can generate the representative workload at a computational layer of the synthetic AI model.

RELATED APPLICATION

The present disclosure is related to U.S. patent application Ser. No. 16/051,078, Attorney Docket Number ALI-A15556US, titled “System and Method for Benchmarking AI Hardware using Synthetic Model,” by inventors Wei Wei, Lingjie Xu, and Lingling Jin, filed 31 Jul. 2018, the disclosure of which is incorporated by reference herein.

BACKGROUND

Field

This disclosure is generally related to the field of artificial intelligence (AI). More specifically, this disclosure is related to a system and method for generating a synthetic model that can benchmark AI hardware.

Related Art

The exponential growth of AI applications has made them a popular medium for mission-critical systems, such as a real-time self-driving vehicle or a critical financial transaction. Such applications have brought with them an increasing demand for efficient AI processing. As a result, equipment vendors race to build larger and faster processors with versatile capabilities, such as graphics processing, to efficiently process AI-related applications. However, a graphics processor may not accommodate efficient processing of mission-critical data. The graphics processor can be limited by processing limitations and design complexity, to name a few factors.

As more AI features are being implemented in a variety of systems (e.g., automatic braking of a vehicle), AI processing capabilities are becoming progressively more important as a value proposition for system designers. Typically, extensive use of input devices (e.g., sensors, cameras, etc.) has led to generation of large quantities of data, which is often referred to as “big data,” that a system uses. The system can use large and complex models that can use AI models to infer decisions from the big data. However, the efficiency of execution of large models on big data depends on the computational capabilities, which may become a bottleneck for the system. To address this issue, the system can use AI hardware (e.g., an AI accelerator) capable of efficiently processing an AI model.

Tensors are often used to represent data associated with AI systems, store internal representations of AI operations, and analyze and train AI models. To efficiently process tensors, some vendors have developed AI accelerators, such as tensor processing units (TPUs), which are processing units designed for handling tensor-based AI computations. For example, TPUs can be used for running AI models and may provide high throughput for low-precision mathematical operations.

While AI accelerators bring many desirable features to AI processing, some issues remain unsolved for benchmarking AI hardware for a variety of applications.

SUMMARY

Embodiments described herein provide a system for facilitating efficient benchmarking of a piece of hardware configured to process artificial intelligence (AI) related operations. During operation, the system determines the workloads of a set of AI models based on layer information associated with a respective layer of a respective AI model in the set of AI models. The set of AI models are representative of applications that run on the piece of hardware. The system forms a set of workload clusters from the determined workloads and determines a representative workload for a workload cluster of the set of workload clusters. The system then determines, using a meta-heuristic, an input size that corresponds to the representative workload. Subsequently, the system determines, based on the set of workload clusters, a synthetic AI model configured to generate a workload that represents statistical properties of the determined workloads on the piece of hardware. The input size can generate the representative workload at a computational layer of the synthetic AI model.

In a variation on this embodiment, the computational layer of the synthetic AI model corresponds to the workload cluster.

In a variation on this embodiment, the system combines the computational layer with a set of computational layers to form the synthetic AI model. A respective computational layer can correspond to a workload cluster of the set of workload clusters.

In a variation on this embodiment, the system adds a rectified linear unit (ReLU) layer and a normalization layer to the computational layer. The computational layer can be a convolution layer.

In a variation on this embodiment, the system determines the representative workload based on a mean or a median of a respective workload in the workload cluster.

In a variation on this embodiment, the system determines the input size from an input size group representing individual input sizes of a set of layers of the set of AI models.

In a further variation, the system determines the input size by setting the representative workload as an objective of the meta-heuristic, setting the individual input sizes and corresponding frequencies as search parameters of the meta-heuristic, and executing the meta-heuristic until reaching within a threshold of the objective.

In a further variation, the meta-heuristic is a genetic algorithm and the objective is a fitness function.

In a further variation, a respective individual input size of the individual input sizes includes number of filters, filter size, and filter stride information of a corresponding layer of the set of layers.

In a variation on this embodiment, the system forms a set of input size groups based on the input sizes of the layers of the set of AI models and independently executes the meta-heuristic on a respective input size group of the set of input size groups.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1A illustrates an exemplary environment that facilitates generation of a synthetic AI model for benchmarking AI hardware, in accordance with an embodiment of the present application.

FIG. 1B illustrates an exemplary benchmarking system that generates a synthetic AI model for benchmarking AI hardware, in accordance with an embodiment of the present application.

FIG. 2A illustrates an exemplary clustering of the workloads of the layers of representative AI models based on respective workloads for generating a synthetic AI model, in accordance with an embodiment of the present application.

FIG. 2B illustrates an exemplary workload table for facilitating the clustering of the workloads, in accordance with an embodiment of the present application.

FIG. 2C illustrates an exemplary grouping of input sizes of the layers of representative AI models for generating a synthetic AI model, in accordance with an embodiment of the present application.

FIG. 3A illustrates an exemplary matching of clusters and corresponding input sizes, in accordance with an embodiment of the present application.

FIG. 3B illustrates an exemplary process of generating input sizes to match corresponding representative workloads of respective clusters, in accordance with an embodiment of the present application.

FIG. 4A illustrates an exemplary input-size determination for a synthetic AI model using a meta-heuristic, in accordance with an embodiment of the present application.

FIG. 4B illustrates an exemplary synthetic AI model representing a set of AI models corresponding to representative applications, in accordance with an embodiment of the present application.

FIG. 5A presents a flowchart illustrating a method of a benchmarking system collecting layer information of representative AI models, in accordance with an embodiment of the present application.

FIG. 5B presents a flowchart illustrating a method of a benchmarking system performing computation load analysis, in accordance with an embodiment of the present application.

FIG. 5C presents a flowchart illustrating a method of a benchmarking system clustering the layers of representative AI models based on respective workloads, in accordance with an embodiment of the present application.

FIG. 5D presents a flowchart illustrating a method of a benchmarking system grouping input sizes of the layers of representative AI models, in accordance with an embodiment of the present application.

FIG. 6A presents a flowchart illustrating a method of a benchmarking system matching clusters and corresponding input sizes, in accordance with an embodiment of the present application.

FIG. 6B presents a flowchart illustrating a method of a benchmarking system determining a representative input size for a corresponding representative workload based on a meta-heuristic, in accordance with an embodiment of the present application.

FIG. 6C presents a flowchart illustrating a method of a benchmarking system generating a synthetic AI model representing a set of AI models, in accordance with an embodiment of the present application.

FIG. 6D presents a flowchart illustrating a method of a benchmarking system benchmarking AI hardware using a synthetic AI model, in accordance with an embodiment of the present application.

FIG. 7 illustrates an exemplary computer system that facilitates a benchmarking system for AI hardware, in accordance with an embodiment of the present application.

FIG. 8 illustrates an exemplary apparatus that facilitates a benchmarking system for AI hardware, in accordance with an embodiment of the present application.

In the figures, like reference numerals refer to the same figure elements.

DETAILED DESCRIPTION

The following description is presented to enable any person skilled in the art to make and use the embodiments, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the embodiments described herein are not limited to the embodiments shown, but are to be accorded the widest scope consistent with the principles and features disclosed herein.

Overview

The embodiments described herein solve the problem of efficiently benchmarking AI hardware by generating a synthetic AI model that represents the statistical characteristics of the workloads of a set of AI models corresponding to representative applications and their execution frequencies. The AI hardware can be a piece of hardware capable of efficiently processing AI-related operations, such as computing a layer of a neural network. The representative applications are the various applications that AI hardware, such as an AI accelerator, may run. Hence, the performance of the AI hardware is typically determined by benchmarking the AI hardware for the set of AI models. Benchmarking refers to the act of running a computer program, a set of programs, or other operations, to assess the relative performance of a software or hardware system. Benchmarking is typically performed by executing a number of standard tests and trials on the system.

An AI model can be any model that uses AI-based techniques (e.g., a neural network). An AI model can be a deep learning model that represents the architecture of a deep learning representation. For example, a neural network can be based on a collection of connected units or nodes where each connection (e.g., a simplified version of a synapse) between artificial neurons can transmit a signal from one to another. The artificial neuron that receives the signal can process it and then signal artificial neurons connected to it.

With existing technologies, the AI models (e.g., deep learning architectures) are typically derived from experimental designs. As a result, these AI models have become more application-specific. For example, these AI models can have functions specific to their intended goals, such as correct image processing or natural language processing (NLP). In the field of image processing, an AI model may only classify images, or in the field of NLP, an AI model may only differentiate linguistic expressions. This application-specific approach causes the AI models to have their own architecture and structure. Even though AI models can be application-specific, AI hardware is usually designed for a wide set of AI-based applications, which can be referred to as representative applications that represent the most typical use of AI.

Hence, to test the performance of the AI hardware for this set of applications, the corresponding benchmarking process can require execution of the set of AI models, which can be referred to as representative AI models, associated with the representative applications. However, running the representative AI models on the AI hardware and determining the respective performances may have a few drawbacks. For example, setting up (e.g., gathering inputs) and executing a respective one of the representative AI models can be time-consuming and labor-intensive. In addition, during the benchmarking process, the relative significance for a respective AI model (e.g., the respective execution frequencies) may not be apparent and may not be reflected during testing.

To solve this problem, embodiments described herein facilitate a benchmarking system that can generate a synthetic AI model, or an SAI model, (e.g., a synthetic neural network) that can efficiently evaluate the AI hardware. The SAI model can represent the computational workloads and execution frequencies of the representative AI models. This allows the system to benchmark the AI hardware by executing the SAI model instead of executing individual AI models on the AI hardware. Since the execution of the SAI model can correspond to the workload of the representative AI models and their respective execution frequencies, the system can benchmark the AI hardware by executing the SAI model and determine the performance of the AI hardware for the representative AI models.

During operation, the system can determine the representative AI models based on the representative applications. For example, if image processing, natural language processing, and data generators are the representative applications, the system can obtain image classification and regression models, voice recognition models, and generative models as representative AI models. The system then collects information associated with a respective layer of a respective AI model. Collected information can include one or more of: number of channels, number of filters, filter size, stride information, and padding information. The system can also determine the execution frequencies of a respective AI application (e.g., how frequently an application runs over a period of time). The system can use one or more framework interfaces, such as a graphics processing unit (GPU) application programming interface (API), to collect the information.

Based on the collected information and the execution frequencies, the system can determine the workload of a respective layer, and store the workload information in a workload table. The system then can cluster workloads of the layers (e.g., using k-means) based on the workload table. The system can determine a representative workload for a respective cluster. The system can also group the input sizes of the layers. The system can determine a representative input size for a respective input group based on a meta-heuristic (e.g., a genetic algorithm). Using the meta-heuristic, the system generates a representative input size of an input group such that the input size can generate a corresponding representative workload. The system can generate an SAI model that includes a layer corresponding to each cluster. The system then executes the SAI model to benchmark the AI hardware. Since the SAI model incorporates the statistical characteristics of the workload of all representative AI models, benchmarking using the SAI model allows the system to determine the performance of all representative AI models.

Exemplary System

FIG. 1A illustrates an exemplary environment that facilitates generation of an SAI model for benchmarking AI hardware, in accordance with an embodiment of the present application. A benchmarking environment 100 can include a synthesizing device 120 and a testing device 110 that includes AI hardware 108. In this example, AI models 130 are the set of representative AI models corresponding to a set of representative applications. AI models 130 can include AI models 132, 134, and 136, forming the set of representative AI models. If image processing, NLP, and data generators are the representative applications, AI models 132, 134, and 136 can be an image classification and regression model, a voice recognition model, and a generative model, respectively.

Device 110 can be equipped with AI hardware 108, such as an AI accelerator, that can efficiently process the computations associated with AI models 130. Device 110 can also include a system processor 102, a system memory device 104, and a storage device 106. Device 110 can be used for testing the performance of AI hardware 108 for one or more of the representative applications. To evaluate the performance of AI hardware 108, device 110 can execute a number of standard tests and trials on AI hardware 108. For example, device 110 can execute AI models 130 on AI hardware 108 to evaluate their performance.

With existing technologies, AI models 130 can be typically derived from experimental designs. As a result, AI models 130 have become more application-specific. For example, each of AI models 130 can have functions specific to an intended goal. For example, AI model 132 can be structured for image processing, and AI model 134 can be structured for NLP. As a result, AI model 132 may only classify images, and AI model 134 may only differentiate linguistic expressions. This application-specific approach causes AI models 130 to have their own architecture and structure. Even though AI models 130 can be application-specific, AI hardware 108 can be designed to efficiently execute any combination of individual models in AI models 130.

Hence, to test the performance of AI hardware 108, a respective one of AI models 130 can be executed on AI hardware 108. However, running a respective one of AI models 130 on AI hardware 108 and determining the respective performances may have a few drawbacks. For example, setting up (e.g., gathering inputs) and executing a respective one of AI models 130 can be time-consuming and labor-intensive. In addition, during the benchmarking process, the relative significance for a respective AI model may not be apparent and may not be reflected during testing. For example, AI model 134 can typically be executed more times than AI model 136 over a period of time. As a result, the benchmarking process needs to accommodate the execution frequencies of AI models 130.

To solve this problem, a benchmarking system 150 can generate an SAI model 140, which can be a synthetic neural network, that can efficiently evaluate AI hardware 108. System 150 can operate on device 120, which can comprise a processor 112, a memory device 114, and a storage device 116. SAI model 140 can represent the computational workloads and execution frequencies of AI models 130. This allows system 150 to benchmark AI hardware 108 by executing SAI model 140 instead of executing individual models of AI models 130 on AI hardware 108. Since the execution of SAI model 140 can correspond to the workload of AI models 130 and their respective execution frequencies, system 150 can benchmark AI hardware 108 by executing SAI model 140 and determine the performance of AI hardware 108 for AI models 130.

During operation, system 150 can determine AI models 130 based on the representative applications. In some embodiments, system 150 can maintain a list of representative applications (e.g., in a local storage device) and their corresponding AI models. This list can be generated during the configuration of system 150 (e.g., by an administrator). Furthermore, AI models 130 can be loaded onto the memory of device 120 such that system 150 may access a respective one of AI models 130. This allows system 150 to collect information associated with a respective layer of AI models 132, 134, and 136. Collected information can include one or more of: number of channels, number of filters, filter size, stride information, and padding information.

System 150 can also determine the execution frequencies of a respective AI model in AI models 130. System 150 can use one or more techniques to collect the information. Examples of collection techniques include, but are not limited to, GPU API calls, TensorFlow calls, Caffe2, and MXNet. Based on the collected information and the execution frequencies, system 150 can determine the workload of a respective layer of a respective one of AI models 130. System 150 may calculate the computation load of a layer based on corresponding input parameters and the algorithm applied on it. System 150 can store the workload information in a workload table.

System 150 can cluster the workloads of the layers by applying a clustering technique to the workload table. For example, system 150 can use a k-means-based clustering technique in such a way that the value of k is configurable and may dictate the number of clusters. System 150 can also group the input sizes of the layers. In some embodiments, the number of input groups also corresponds to the value of k. Under such a scenario, the number of clusters corresponds to the number of input groups. System 150 can determine a representative workload for a respective cluster. To do so, system 150 can calculate a mean or a median of the workloads associated with the cluster (e.g., of the workloads of the layers in the cluster). Similarly, system 150 can also determine an estimated input size for a respective input group.

System 150 can establish an initial match between a cluster and a corresponding input group based on a match between the representative workload of that cluster and the estimated input size of the input group. Based on the initial match, system 150 selects an input group for a cluster. System 150 then determines a representative input size of the selected input group such that the input size can generate the representative workload of the cluster. System 150 can use a meta-heuristic to generate the representative input size. The meta-heuristic can set the representative workload as an objective and use the input sizes of the input group as search parameters.
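The search described above can be sketched as a small genetic algorithm. This is only an illustration under stated assumptions, not the disclosed implementation: the workload model `conv_macs`, its fixed feature-map size and channel count, the mutation rates, and all function and parameter names are hypothetical. An input size is taken to be a (number of filters, filter size, stride) tuple, the execution frequencies weight the initial population, and the search stops once the candidate workload is within a relative threshold of the representative workload.

```python
import random


def conv_macs(candidate, in_size=56, channels=64):
    """Hypothetical workload model: MACs of an unpadded square convolution."""
    filters, filter_size, stride = candidate
    out = (in_size - filter_size) // stride + 1
    return out * out * filters * filter_size * filter_size * channels


def relative_error(candidate, target):
    """Distance from the objective; a lower value means a fitter candidate."""
    return abs(conv_macs(candidate) - target) / target


def search_input_size(group, target, threshold=0.05, generations=200,
                      population_size=20, seed=0):
    """Genetic search over an input group given as [(input_size, frequency), ...]."""
    rng = random.Random(seed)
    sizes = [size for size, _ in group]
    weights = [freq for _, freq in group]
    # Per-gene value pools come from the input sizes observed in the group.
    pools = [sorted({size[i] for size in sizes}) for i in range(3)]
    # More frequently executed input sizes are more likely to seed the population.
    population = rng.choices(sizes, weights=weights, k=population_size)
    best = min(population, key=lambda c: relative_error(c, target))
    for _ in range(generations):
        if relative_error(best, target) <= threshold:
            break  # within the threshold of the objective
        population.sort(key=lambda c: relative_error(c, target))
        parents = population[: population_size // 2]
        children = []
        while len(children) < population_size - len(parents):
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: swap in a value from the group, or nudge the filter count.
            if rng.random() < 0.3:
                gene = rng.randrange(3)
                child[gene] = rng.choice(pools[gene])
            if rng.random() < 0.3:
                child[0] = max(1, child[0] + rng.choice([-8, 8]))
            children.append(tuple(child))
        population = parents + children
        best = min(population + [best], key=lambda c: relative_error(c, target))
    return best
```

The best candidate is carried across generations, so the returned input size is never worse than the best member of the initial population. If the generation budget runs out first, the function returns the closest candidate found rather than failing.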

System 150 then generates SAI model 140 in such a way that a respective layer of SAI model 140 corresponds to a cluster and the input size for that layer is the representative input size matched to that cluster. System 150 may send SAI model 140 and its corresponding inputs to device 110 through file transfer (e.g., via a network 170, which can be a local or a wide area network). An instance of system 150 can operate on device 110 and execute SAI model 140 on AI hardware 108 for benchmarking. Since SAI model 140 incorporates the statistical characteristics of the workload of AI models 130, benchmarking using SAI model 140 allows system 150 to determine the performance of all of AI models 130 on AI hardware 108.

FIG. 1B illustrates an exemplary benchmarking system that generates a synthetic AI model for benchmarking AI hardware, in accordance with an embodiment of the present application. During operation, system 150 generates SAI model 140 that statistically matches the workload (i.e., computation load) of AI models 130. SAI model 140 can represent the statistical characteristics of the workload of each layer (e.g., convolution, pooling, normalization, etc.) of a respective one of AI models 130. Hence, evaluation results of SAI model 140 on AI hardware 108 can produce a statistically representative benchmark of AI models 130 running on AI hardware 108. This can improve the runtime of the benchmarking process.

System 150 can include a collection unit 152, a computation load analysis unit 154, a clustering unit 156, a grouping unit 158, and a synthesis unit 160. Collection unit 152 collects the layer information using a monitoring system 151, which can deploy one or more collection techniques, such as issuing API calls, for collecting information. Monitoring system 151 can obtain a number of channels, number of filters, filter size, stride information, and padding information associated with a respective layer of a respective one of AI models 130. It should be noted that if the number of representative AI models is large, monitoring system 151 may issue hundreds of thousands of API calls for different layers of the representative AI models.

Computation load analysis unit 154 then determines the computational load or the workload from the collected information. To do so, computation load analysis unit 154 can classify the layers. For example, the classes can correspond to convolution layer, pooling layer, and normalization layer. For each class, computation load analysis unit 154 can calculate the workload of a layer based on the input parameters and algorithms applicable to the layer. In some embodiments, the workload of a layer can be calculated based on multiply-accumulate (MAC) time for the operations associated with the layer. Computation load analysis unit 154 then stores the computed workload in a workload table in association with the layer (e.g., using a layer identifier).
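As a minimal sketch of such a calculation for a convolution layer, the following counts MAC operations with the standard convolution output-size arithmetic and converts the count into a MAC time. The class and function names, the square-filter simplification, and the `macs_per_second` throughput parameter are assumptions made for illustration; the disclosure does not specify how MAC time is derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayerInfo:
    """Layer information of the kind the collection unit gathers (illustrative names)."""
    in_height: int
    in_width: int
    channels: int      # number of input channels
    filters: int       # number of filters (output channels)
    filter_size: int   # square filter: filter_size x filter_size
    stride: int
    padding: int


def output_dim(size: int, filter_size: int, stride: int, padding: int) -> int:
    """Standard convolution output-size arithmetic along one spatial dimension."""
    return (size + 2 * padding - filter_size) // stride + 1


def conv_macs(layer: ConvLayerInfo) -> int:
    """Count multiply-accumulate operations for one forward pass of the layer."""
    out_h = output_dim(layer.in_height, layer.filter_size, layer.stride, layer.padding)
    out_w = output_dim(layer.in_width, layer.filter_size, layer.stride, layer.padding)
    macs_per_output = layer.filter_size * layer.filter_size * layer.channels
    return out_h * out_w * layer.filters * macs_per_output


def mac_time(layer: ConvLayerInfo, macs_per_second: float) -> float:
    """Estimate MAC time in seconds under an assumed sustained MAC throughput."""
    return conv_macs(layer) / macs_per_second
```

For example, a 224x224 input with 3 channels and 64 filters of size 7, stride 2, and padding 3 yields a 112x112 output and 118,013,952 MAC operations.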

Clustering unit 156 can cluster the workloads of the layers in such a way that similar workloads are included in the same cluster. Clustering unit 156 can use a clustering technique, such as a k-means-based clustering technique, to determine the clusters. In some embodiments, clustering unit 156 can use a predetermined or a configured value of k, which in turn, may dictate the number of clusters to be formed. Clustering unit 156 can determine the representative workload, or the center, for each cluster by calculating a mean or a median of the workloads associated with that cluster. Similarly, grouping unit 158 can group the similar input sizes of the layers into input groups. Grouping unit 158 can also use a meta-heuristic to determine the representative input size of a respective input group.

Synthesis unit 160 then synthesizes SAI model 140 based on the number of clusters. Typically, convolution is considered as the most important layer since the computational load of the convolution layers of an AI model represents most of the workload of the AI model. Hence, synthesis unit 160 can form SAI model 140 by clustering the workloads of the convolution layers. For example, if clustering unit 156 has formed n clusters of the workloads of the convolution layers, synthesis unit 160 can rank the representative workloads of these n clusters. Synthesis unit 160 can map each cluster to a corresponding input group in such a way that the representative input size of the input group can generate the representative workload of the cluster. To do so, synthesis unit 160 may adjust the input size of an input group. For example, synthesis unit 160 can adjust the number of channels, filter size, and stride for each layer of SAI model 140 to ensure that the workload of the layer corresponds to the workload of the associated cluster.
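One simple way to picture the ranking and mapping step is to pair clusters and input groups by rank. The pairing rule here is an assumption for illustration only: the description states that the representative workloads are ranked and that each cluster is mapped to an input group, but it does not prescribe this particular rule. The function name and the workload estimate attached to each input group are hypothetical.

```python
def match_by_rank(cluster_workloads, group_estimates):
    """Pair clusters with input groups by workload rank (assumed pairing rule).

    cluster_workloads: {cluster name: representative workload}
    group_estimates:   {input group name: workload estimated from its input size}
    """
    if len(cluster_workloads) != len(group_estimates):
        raise ValueError("expected one input group per cluster")
    # Sort both sides from smallest to largest workload, then pair them in order.
    clusters = sorted(cluster_workloads, key=cluster_workloads.get)
    groups = sorted(group_estimates, key=group_estimates.get)
    return dict(zip(clusters, groups))
```

The resulting pairs would only be an initial match. The input size of each matched group can still be adjusted afterwards so that it generates the representative workload of its cluster.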

Cluster and Group Formation

FIG. 2A illustrates an exemplary clustering of the workloads of the layers of representative AI models based on respective workloads for generating a synthetic AI model, in accordance with an embodiment of the present application. To cluster the layers based on their respective workloads, system 150 determines a class of layers of interest. In some embodiments, system 150 can select the convolution layers (denoted with dashed lines) for forming clusters since these layers are responsible for most of the computations of an AI model. In other words, if system 150 generates an SAI model that represents the statistical properties of the workloads of the convolution layers of AI models 130, that SAI model can be representative of the workloads of AI models 130.

System 150 then computes the workload associated with a respective layer of a respective one of AI models 130. For example, for a layer 220 of AI model 134, system 150 determines layer information 224, which can include number of filters, filter size, stride information, and padding information. In some embodiments, system 150 uses layer information 224 to determine the MAC operations associated with layer 220 and compute MAC time that indicates the time to execute the determined MAC operations. System 150 can use the computed MAC time as workload 222 for that layer. Suppose that the execution frequency of AI model 134 is 3. System 150 can then calculate workload 222 three times, and consider each of them as a workload of an individual and separate layer. Alternatively, system 150 can store workload 222 in association with the execution frequency of AI model 134. This allows system 150 to accommodate execution frequencies of AI models 130.

System 150 can repeat this process for a respective selected layer of a respective one of AI models 130. In some embodiments, system 150 can store the computed workloads in a workload table 240. System 150 then parses workload table 240 to cluster the workloads into a set of clusters 212, 214, and 216. System 150 can form a cluster using any clustering technique. System 150 can determine the number of clusters based on a clustering parameter. The parameter can be based on how the workloads are distributed (e.g., based on a range of workloads that can be included in a cluster or a diameter of a cluster) or a predetermined number of clusters. Based on the clustering parameter, in the example in FIG. 2A, clusters 212, 214, and 216 can include five, six, and eight workloads, respectively.
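Since any clustering technique can be used, the following is just one possible sketch: a one-dimensional k-means (Lloyd's algorithm) over scalar workloads with a predetermined number of clusters k. The function name, the deterministic initialization, and the choice to treat each workload as a single scalar are assumptions for illustration.

```python
from statistics import mean


def kmeans_1d(workloads, k, max_iter=100):
    """Cluster scalar workloads with Lloyd's algorithm; returns (centers, clusters)."""
    if not 1 <= k <= len(workloads):
        raise ValueError("k must be between 1 and the number of workloads")
    ordered = sorted(workloads)
    # Deterministic initialization: spread the initial centers across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each workload joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for w in ordered:
            nearest = min(range(k), key=lambda i: abs(w - centers[i]))
            clusters[nearest].append(w)
        # Update step: the new center is the mean; an empty cluster keeps its old center.
        updated = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, clusters
```

A workload whose model has an execution frequency greater than one can be passed in repeatedly, which corresponds to treating each execution as a separate layer workload.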

System 150 then determines a representative workload for a respective cluster. In the example in FIG. 2A, cluster 216 can include eight workloads corresponding to different layers and their respective execution frequencies. System 150 can calculate a representative workload 236 for cluster 216 by calculating the average (or the median) of the eight workloads in cluster 216. In the same way, system 150 can calculate representative workload 232 for cluster 212 based on the five workloads in cluster 212 and representative workload 234 for cluster 214 based on the six workloads in cluster 214. Since the workloads in a cluster also incorporate the execution frequencies, the representative workload for a cluster can be closer to the workload of a layer with a high execution frequency. For example, since the execution frequency of layer 242 is three and the execution frequency of layer 244 is one, representative workload 234 is closer to the workload of layer 242.

FIG. 2B illustrates an exemplary workload table for facilitating the clustering of the workloads, in accordance with an embodiment of the present application. Workload table 240 can include a respective workload computed by system 150. Workload table 240 can map a respective workload to a corresponding AI model identifier, a layer identifier of the layer corresponding to the workload, and an execution frequency of the AI model. Suppose that AI model 132 includes layers 246, 247, and 248, which can be convolution layers. AI model 132 can be identified by a model identifier 250 and layers 246, 247, and 248 can be identified by layer identifiers 252, 254, and 256, respectively. AI model 132 can have an execution frequency 260. In the example in FIG. 2A, the value of execution frequency 260 is 2.

During operation, system 150 computes workload 262 for layer 246. System 150 can generate an entry in workload table 240 for workload 262, which maps workload 262 to AI model identifier 250, layer identifier 252, and execution frequency 260. This allows system 150 to compute workload 262 once instead of the number of times specified by execution frequency 260. When system 150 computes the representative workload, system 150 can consider (workload 262 × execution frequency 260) for the computation. In the same way, system 150 computes workloads 264 and 266 for layers 247 and 248, respectively, of AI model 132. System 150 can store workloads 264 and 266 in workload table 240 in association with the corresponding AI model identifier 250, layer identifiers 254 and 256, respectively, and execution frequency 260.
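
By way of example only, a workload table of this kind could be represented as a list of records, each storing a workload once together with the execution frequency of its AI model. The class name, field names, identifier strings, and workload values below are hypothetical and are not taken from the figures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkloadEntry:
    model_id: str             # identifies the AI model
    layer_id: str             # identifies the layer within that model
    workload: int             # computed once per layer
    execution_frequency: int  # execution frequency of the AI model

    @property
    def weighted_workload(self) -> int:
        # Product considered when computing a representative workload.
        return self.workload * self.execution_frequency

def build_workload_table(model_id, execution_frequency, layer_workloads):
    """layer_workloads: dict mapping a layer identifier to its workload."""
    return [WorkloadEntry(model_id, layer_id, workload, execution_frequency)
            for layer_id, workload in layer_workloads.items()]

# Illustrative values only.
table = build_workload_table(
    "model-250", 2,
    {"layer-252": 500, "layer-254": 300, "layer-256": 200},
)
```

Storing the frequency next to the workload avoids recomputing the same workload for every execution of the model.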

FIG. 2C illustrates an exemplary grouping of input sizes of the layers of representative AI models for generating a synthetic AI model, in accordance with an embodiment of the present application. System 150 can obtain the input size of a respective layer of a respective one of AI models 130. For example, for layer 220 of AI model 134, system 150 determines input size 228, which can include number of filters, filter size, stride information, and padding information. Similarly, system 150 determines the input size of a respective selected layer (e.g., the convolution layer) of a respective one of AI models 130. System 150 then groups the input sizes into a set of input groups 272, 274, and 276. System 150 can form an input group using any grouping technique.

System 150 then determines a representative input size for a respective input group. In the example in FIG. 2C, input group 276 can include two input sizes corresponding to different layers. Since layers 220 and 244 can have the same input size 228, system 150 may consider input size 228 once or twice in input group 276 depending on a calculation policy. System 150 can calculate a center input size 286 for input group 276 by calculating the average (or the median) of the two (or three depending on the calculation policy) input sizes in input group 276. In the same way, system 150 can calculate center input size 282 for input group 272 based on the two input sizes in input group 272 and center input size 284 for input group 274 based on the three input sizes in input group 274.

If the calculation policy indicates that each input size is considered based on its frequency (e.g., input size 228 is considered twice), a respective input group can include one or more subgroups, each of which indicates a frequency of a particular input size. In this example, input group 276 can include subgroups 275 and 277. Subgroup 275 can include an input size with a frequency of one. On the other hand, subgroup 277 can include an input size with a frequency of two. In other words, subgroup 277 can include input size 228 twice, which corresponds to the input size for layers 220 and 244.
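
For illustration, the subgroups of an input group can be derived by counting how often each distinct input size occurs. The tuple fields and the numeric values below are assumptions made for this sketch. Padding information is omitted for brevity.

```python
from collections import Counter, namedtuple

# Hypothetical record for an input size.
InputSize = namedtuple("InputSize", ["num_filters", "filter_size", "stride"])

def build_subgroups(input_sizes):
    """Return (input_size, frequency) pairs, one per subgroup."""
    counts = Counter(input_sizes)
    # Order subgroups by ascending frequency for a stable presentation.
    return sorted(counts.items(), key=lambda item: item[1])

# Illustrative values only: two layers share the same input size.
group = [InputSize(64, 3, 1), InputSize(96, 5, 2), InputSize(96, 5, 2)]
subgroups = build_subgroups(group)
```

Here the first subgroup holds an input size with a frequency of one and the second holds an input size with a frequency of two, analogous to subgroups 275 and 277.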

Synthesis

System 150 uses clusters 212, 214, and 216 to generate the layers of SAI model 140. System 150 further determines the input size for a respective layer corresponding to the representative workload of each of clusters 212, 214, and 216. To do so, system 150 matches clusters 212, 214, and 216 to input groups 272, 274, and 276. FIG. 3A illustrates an exemplary matching of clusters and corresponding input sizes, in accordance with an embodiment of the present application. During operation, system 150 determines, for each of representative workloads 232, 234, and 236, the input size that can generate the representative workload for a corresponding layer.

To do so, system 150 can match center input sizes 282, 284, and 286, respectively, to representative workloads 232, 234, and 236. For example, system 150 can determine whether channel number, filter size, and stride in input size 282 generate a corresponding workload 232 (i.e., generate the corresponding MAC time). If it is a match, system 150 allocates input size 282 as the input to layer 312 of SAI model 140. In this way, system 150 builds SAI model 140, which comprises three layers 312, 314, and 316 corresponding to clusters 212, 214, and 216, respectively. Layers 312, 314, and 316 can use center input sizes 282, 284, and 286, respectively, as inputs. For each of these input sizes, channel number, filter size, and stride can generate the corresponding workload.

However, input sizes 282, 284, and/or 286, used as inputs to layers of an AI model, may not generate corresponding workloads 232, 234, and/or 236, respectively. Under such circumstances, system 150 can use input sizes 282, 284, and 286 to establish an initial match with workloads 232, 234, and/or 236, respectively. This initial match indicates that input groups 272, 274, and 276 should be used to generate workloads 232, 234, and/or 236, respectively. System 150 then uses the input sizes of a respective input group to generate a representative input size that can represent the corresponding workload.

FIG. 3B illustrates an exemplary process of generating input sizes to match corresponding representative workloads of respective clusters, in accordance with an embodiment of the present application. For a respective input group, system 150 can apply a meta-heuristic 360 to the input sizes in that input group and determine a representative input size for the input group. To determine a representative input size that can generate a representative workload, system 150 determines which input group corresponds to the cluster of the representative workload based on the initial match. In some embodiments, system 150 can maintain a table representing the initial match. This table can map a cluster (and its representative workload) to an input group. The mapping can also include the subgroups of that input group and the frequency of a respective subgroup.

Suppose that cluster 212 (and its representative workload 232) is mapped to input group 272. To determine the input size that can generate workload 232, system 150 can set workload 232 as the objective of meta-heuristic 360, and use a respective subgroup and a corresponding frequency of input group 272 as search parameters to meta-heuristic 360. For a respective subgroup of input group 272, system 150 can consider channel number, filter size, and filter stride as the input size for meta-heuristic 360. Similarly, system 150 can set workloads 234 and 236 as the objective of meta-heuristic 360, and use a respective subgroup and a corresponding frequency of input groups 274 and 276, respectively, as search parameters to meta-heuristic 360. By running meta-heuristic 360 independently on each of input groups 272, 274, and 276, system 150 can generate corresponding input sizes 332, 334, and 336, respectively. In some embodiments, meta-heuristic 360 can be a genetic algorithm, and the workload can be the fitness function of the genetic algorithm.

Input size 332 can generate workload 232 if used as an input to a layer of an AI model. Similarly, input sizes 334 and 336 can generate workloads 234 and 236, respectively. In this way, system 150 determines input sizes 332, 334, and 336 for the layers of SAI model 140 corresponding to clusters 212, 214, and 216, respectively. For example, system 150 determines channel number, filter size, and stride in input size 332 such that input size 332 can generate workload 232. Furthermore, system 150 also determines channel number, filter size, and stride in input sizes 334 and 336 for generating workloads 234 and 236, respectively. System 150 then builds SAI model 140, which comprises three layers 312, 314, and 316 corresponding to clusters 212, 214, and 216, respectively.

FIG. 4A illustrates an exemplary input-size determination for a synthetic AI model using a meta-heuristic, in accordance with an embodiment of the present application. System 150 can maintain an input group table 400 that maps an input group to its center input size. For each input group, table 400 can also include a respective input size in the input group and the frequency of that input size. An input size and frequency pair can represent a subgroup in the input group. Table 400 maps input groups 272, 274, and 276 to center input sizes 282, 284, and 286, respectively. For input group 272, table 400 further maps input sizes 421 and 422 to their frequencies 411 and 412, respectively. Similarly, for input group 274, table 400 further maps input sizes 423 and 424 to their frequencies 413 and 414, respectively; and for input group 276, table 400 further maps input sizes 425 and 426 to their frequencies 415 and 416, respectively. As described in conjunction with FIG. 2C, input sizes 425 and 426 correspond to subgroups 275 and 277, respectively, and frequencies 415 and 416 can be 1 and 2, respectively. Similarly, frequencies 411, 412, 413, and 414 can be 1, 1, 1, and 2, respectively, indicating the frequencies of input sizes 421, 422, 423, and 424, respectively.

Based on the initial match, system 150 can determine which representative workload corresponds to which input group, as described in conjunction with FIG. 3B. System 150 can then apply meta-heuristic 360 to a respective input group in table 400 with the corresponding workload as the objective. Here, system 150 individually applies meta-heuristic 360 to each input group in table 400 to determine a representative input size for that input group. In some embodiments, meta-heuristic 360 can be based on a genetic algorithm and the objective can be the fitness function. In table 400, system 150 can apply meta-heuristic 360 individually to each of input groups 272, 274, and 276 with workloads 232, 234, and 236, respectively, as objectives. In this way, system 150 independently searches the inputs in each input group (e.g., the filter size and stride, and the corresponding frequency) using meta-heuristic 360. Based on the independent searching, system 150 determines input sizes 332, 334, and 336 for input groups 272, 274, and 276, respectively.

Suppose that the center input size for an input group is 224×224, and the input group includes 4 convolution operations grouped into 3 subgroups with the 3 corresponding combinations of filter size and filter stride. The total computation load can be 2156022912 for that input group. Since the number of filters is usually under 1024, system 150 can set length L=10 for each binary string for meta-heuristic 360. A 10-bit string can encode 1024 values, so each string can represent 1 to 1024 filters. As there are 4 convolution operations in the input group, the total length of the binary string can be 4×L=40 bits, generating 2⁴⁰ possible solutions. Since this is a large solution space, system 150 can use an initial generation of 2000 individuals and run the genetic algorithm for 50 iterations.
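
A minimal sketch of such a search is given below, under stated assumptions. The workload formula (output size squared × filter size squared × number of filters), the operation list, the target value, the elitist selection, the single-point crossover, and the mutation rate are all choices made for this illustration and are not prescribed by the description above. The population is reduced to 200 individuals to keep the example quick to run.

```python
import random

L = 10  # bits per convolution operation, encoding 1..1024 filters

def decode(bits):
    """Split the bit string into L-bit chunks; each chunk maps to 1..1024."""
    return [int("".join(map(str, bits[i:i + L])), 2) + 1
            for i in range(0, len(bits), L)]

def total_workload(filters, ops):
    """ops: list of (output_size, filter_size) pairs, one per operation."""
    return sum(out * out * k * k * n for n, (out, k) in zip(filters, ops))

def run_ga(ops, target, pop_size=200, generations=50,
           mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    n_bits = L * len(ops)

    def error(individual):
        # Distance from the objective (the representative workload).
        return abs(total_workload(decode(individual), ops) - target)

    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    initial_best = min(map(error, pop))
    for _ in range(generations):
        pop.sort(key=error)
        elite = pop[: pop_size // 5]  # keep the best 20% unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_bits)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = elite + children
    best = min(pop, key=error)
    return decode(best), error(best), initial_best

# Hypothetical group: 4 operations in 3 subgroups (the last one occurs twice).
ops = [(55, 11), (26, 5), (13, 3), (13, 3)]
target = total_workload([100, 80, 150, 150], ops)
filters, final_error, initial_error = run_ga(ops, target)
```

Because the best individuals are carried over unchanged, the error of the best solution cannot increase from one iteration to the next. A stopping rule based on a threshold around the objective could replace the fixed iteration count.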

FIG. 4B illustrates an exemplary synthetic AI model representing a set of AI models corresponding to representative applications, in accordance with an embodiment of the present application. Upon determining input sizes 332, 334, and 336, system 150 builds SAI model 140 with layers 312, 314, and 316 corresponding to clusters 212, 214, and 216, respectively. System 150 determines layers 312, 314, and 316 in such a way that these layers use input sizes 332, 334, and 336 to generate workloads 232, 234, and 236, respectively. Since the convolution layers of AI models 130 represent most of the workloads, system 150 can generate layers 312, 314, and 316 as convolution layers.

For example, suppose that SAI model 140 generates a synthetic image based on an input image. Suppose that the input image size is 224×224×3. The output image dimension can be calculated as (input image size - filter size)/stride + 1. Suppose that workload 232 is 36602000 (e.g., a MAC value of 36602000). System 150 then determines channel number as 100, filter size as 11×11, and stride as 4 for input size 332. This leads to an output image size of 55. This can generate a workload of approximately 36602500, which is a close approximation of workload 232, for layer 312. In some embodiments, system 150 considers two values to be close approximations of each other if they are within a threshold value of each other.

In the same way, workload 234 can be 1351000. System 150 then determines channel number as 80, filter size as 5×5, and stride as 2 for input size 334. This leads to an output image size of 26. This can generate a workload of approximately 1352000, which is a close approximation of workload 234, for layer 314. Similarly, workload 236 can be 228000. System 150 then determines channel number as 150, filter size as 3×3, and stride as 2 for input size 336. This leads to an output image size of 13. This can generate a workload of approximately 228150, which is a close approximation of workload 236, for layer 316.
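
The figures quoted in these examples are consistent with computing a workload as output size squared × filter size squared × channel number. This formula is inferred from the quoted values rather than stated above, and the relative tolerance used below is a hypothetical threshold. The following sketch reproduces the three workloads from the quoted output sizes.

```python
def conv_workload(output_size, filter_size, num_channels):
    """Inferred formula: one multiply-accumulate per filter tap,
    per output position, per channel."""
    return output_size ** 2 * filter_size ** 2 * num_channels

def is_close(actual, target, rel_tol=0.001):
    """Hypothetical closeness check: within rel_tol of the target."""
    return abs(actual - target) <= rel_tol * target

# (output size, filter size, channel number, representative workload)
examples = [
    (55, 11, 100, 36602000),
    (26, 5, 80, 1351000),
    (13, 3, 150, 228000),
]
results = [conv_workload(out, k, c) for out, k, c, _ in examples]
matches = [is_close(r, target) for r, (_, _, _, target) in zip(results, examples)]
```

With the assumed 0.1% tolerance, all three generated workloads count as close approximations of their representative workloads.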

Furthermore, to ensure transition among layers 312, 314, and 316, system 150 can incorporate a rectified linear unit (ReLU) layer and a normalization layer in a respective one of layers 312, 314, and 316. As a result, a respective one of these layers includes convolution, ReLU, and normalization layers. For example, layer 314 can include convolution layer 452, ReLU layer 454, and normalization layer 456. System 150 then appends a fully connected layer 402 and a softmax layer 404 to SAI model 140. In this way, system 150 completes the construction of SAI model 140.

System 150 then determines the performance of AI hardware 108 to generate benchmark 450. Since workloads 232, 234, and 236 represent the statistical properties of the selected layers of AI models 130, benchmarking AI hardware 108 using SAI model 140 can be considered as similar to benchmarking AI hardware 108 using a respective one of AI models 130 on AI hardware 108 at corresponding execution frequencies. Therefore, system 150 can efficiently generate benchmark 450 for AI hardware 108 by executing SAI model 140, thereby avoiding the drawbacks of benchmarking AI hardware 108 using a respective one of AI models 130.

Operations

FIG. 5A presents a flowchart 500 illustrating a method of a benchmarking system collecting layer information of representative AI models, in accordance with an embodiment of the present application. During operation, the system identifies a representative AI model associated with a representative application (operation 502). The system can interface with the AI model and collect information associated with a respective layer of the AI model (operation 504). The system determines an execution frequency of the AI model based on the corresponding execution frequency of the application (operation 506). The system then checks whether it has analyzed all representative applications (operation 508). If it hasn't analyzed all representative applications, the system continues to identify a representative AI model associated with the next representative application (operation 502). Upon analyzing all representative applications, the system stores the collected information in a local storage device (operation 510).

FIG. 5B presents a flowchart 530 illustrating a method of a benchmarking system performing computation load analysis, in accordance with an embodiment of the present application. During operation, the system classifies a respective layer of a respective representative AI model (operation 532) and determines parameters (and algorithms) applicable to a layer based on the locally stored information (operation 534). Such parameters can include number of filters, filter size, stride information, and padding information associated with the layer. The system then calculates the workload for the layer based on the parameters (and algorithms) (operation 536).

The system can, optionally, repeat the calculation based on the execution frequency of the AI model (operation 538). Alternatively, the system can store the workload in association with the execution frequency of the AI model. The system then stores the calculated workload(s) in association with the layer identification information (and the execution frequency) in a workload table (operation 540). The system checks whether it has analyzed all layers (operation 542). If it hasn't analyzed all layers, the system continues to determine parameters (and algorithms) applicable to the next layer based on the locally stored information (operation 534). Upon analyzing all layers, the system initiates the clustering process (operation 544).

FIG. 5C presents a flowchart 550 illustrating a method of a benchmarking system clustering the layers of representative AI models based on respective workloads, in accordance with an embodiment of the present application. During operation, the system obtains the configurations for clustering the workloads (e.g., the value of k) (operation 552) and parses the workload table to obtain the workloads and corresponding execution frequencies (operation 554). The system clusters the workloads using a clustering technique (e.g., using k-means-based clustering) based on the configurations (operation 556). The system then determines the representative workload for a respective cluster (operation 558).
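
As one possible illustration of operations 552-558, a one-dimensional k-means pass over scalar workloads can be sketched as follows. The deterministic initialization, the iteration limit, and the sample workloads are assumptions made for this sketch. Any clustering technique may be substituted, and execution frequencies could be handled by repeating a workload according to its frequency before clustering.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster scalar workloads into k clusters (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Deterministic start: evenly spaced samples of the sorted workloads.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)]
               for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # The mean of each cluster serves as its representative workload.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Illustrative workloads only.
workloads = [10, 12, 11, 100, 105, 98, 1000, 990]
centers, clusters = kmeans_1d(workloads, k=3)
```

Each returned center corresponds to the mean-based representative workload of its cluster.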

FIG. 5D presents a flowchart 570 illustrating a method of a benchmarking system grouping input sizes of the layers of representative AI models, in accordance with an embodiment of the present application. During operation, the system determines the input size for a respective layer (operation 572). The system groups the input sizes into input groups (operation 574). In some embodiments, the number of input groups can correspond to the number of clusters. The system then determines the representative input size for a respective input group (operation 576).

FIG. 6A presents a flowchart 600 illustrating a method of a benchmarking system matching clusters and corresponding input sizes, in accordance with an embodiment of the present application. During operation, the system selects a class of layer (e.g., the convolution layer) for synthesis and obtains the representative workload of a respective cluster for the selected class (operation 602). The system obtains a respective input group for the selected class (operation 604). The system then selects a cluster, its representative workload, and a corresponding input group (operation 606). Subsequently, the system determines an input size that can generate the representative workload using a meta-heuristic on the input group (operation 608). The system checks whether it has analyzed all clusters (operation 610). If the system hasn't analyzed all clusters, the system continues to select another cluster, its representative workload, and a corresponding input group (operation 606). Upon analyzing all clusters, the system initiates the synthesis process (operation 612).

FIG. 6B presents a flowchart 620 illustrating a method of a benchmarking system determining a representative input size for a corresponding representative workload based on a meta-heuristic, in accordance with an embodiment of the present application. During operation, the system selects an input group and sets the corresponding representative workload as an objective of the meta-heuristic (e.g., a fitness function for a genetic algorithm) (operation 622). The system then sets the filter size and filter stride, and the corresponding frequency of a respective subgroup in the input group as the search parameters for the meta-heuristic (operation 624). The system then executes the meta-heuristic to determine the representative input size that can generate the representative workload (e.g., the representative MAC) (operation 626). This execution can include executing the meta-heuristic until it reaches within a threshold (e.g., within 0.05%) of the objective. It should be noted that the system independently executes this process for a respective input group, as described in conjunction with FIG. 4A.

FIG. 6C presents a flowchart 630 illustrating a method of a benchmarking system generating a synthetic AI model representing a set of AI models, in accordance with an embodiment of the present application. During operation, the system determines a layer of the SAI model corresponding to a respective cluster (operation 632). This layer can correspond to a convolution layer and the SAI model can be a synthetic neural network. The system can add additional layers, such as a ReLU layer and a normalization layer, to a respective layer of the SAI model (operation 634). The system can add final layers, which can include a fully connected layer and a softmax layer, to complete the SAI model (operation 636).

FIG. 6D presents a flowchart 650 illustrating a method of a benchmarking system benchmarking AI hardware using a synthetic AI model, in accordance with an embodiment of the present application. During operation, the system receives the SAI model on the testing device comprising the AI hardware to be evaluated (operation 652) and benchmarks the AI hardware by executing the SAI model on the AI hardware (operation 654). The system then collects and stores benchmark information associated with the AI hardware (operation 656).

Exemplary Computer System and Apparatus

FIG. 7 illustrates an exemplary computer system that facilitates a benchmarking system for AI hardware, in accordance with an embodiment of the present application. Computer system 700 includes a processor 702, a memory device 704, and a storage device 708. Memory device 704 can include a volatile memory device (e.g., a dual in-line memory module (DIMM)). Furthermore, computer system 700 can be coupled to a display device 710, a keyboard 712, and a pointing device 714. Storage device 708 can store an operating system 716, a benchmarking system 718, and data 736. In some embodiments, computer system 700 can also include AI hardware 706 comprising one or more AI accelerators, as described in conjunction with FIG. 1A. Benchmarking system 718 can incorporate the operations of system 150.

Benchmarking system 718 can include instructions, which when executed by computer system 700 can cause computer system 700 to perform methods and/or processes described in this disclosure. Specifically, benchmarking system 718 can include instructions for collecting information associated with a respective layer of a respective one of representative AI models (collection module 720). Benchmarking system 718 can also include instructions for calculating the workload (i.e., the computational load) for a respective layer of a respective one of representative AI models (workload module 722). Furthermore, benchmarking system 718 includes instructions for clustering the workloads and determining a representative workload for a respective cluster (clustering module 724).

In addition, benchmarking system 718 includes instructions for grouping input sizes of a respective layer of a respective one of representative AI models into input groups (grouping module 726). Benchmarking system 718 can further include instructions for determining a representative input size for a respective input group (grouping module 726). Benchmarking system 718 can also include instructions for generating an input size corresponding to a respective representative workload based on matching and/or a meta-heuristic, as described in conjunction with FIGS. 3A and 3B (synthesis module 728). Benchmarking system 718 can include instructions for generating an SAI model based on the clusters and the input sizes (synthesis module 728).

Benchmarking system 718 can also include instructions for benchmarking AI hardware by executing the SAI model (performance module 730). Benchmarking system 718 may further include instructions for sending and receiving messages (communication module 732). Data 736 can include any data that can facilitate the operations of system 150. Data 736 may include one or more of: layer information, a workload table, cluster information, and input group information.

FIG. 8 illustrates an exemplary apparatus that facilitates a benchmarking system for AI hardware, in accordance with an embodiment of the present application. Benchmarking apparatus 800 can comprise a plurality of units or apparatuses, which may communicate with one another via a wired, wireless, quantum light, or electrical communication channel. Apparatus 800 may be realized using one or more integrated circuits, and may include fewer or more units or apparatuses than those shown in FIG. 8. Further, apparatus 800 may be integrated in a computer system, or realized as a separate device that is capable of communicating with other computer systems and/or devices. Specifically, apparatus 800 can comprise units 802-814, which perform functions or operations similar to modules 720-732 of computer system 700 of FIG. 7, including: a collection unit 802; a workload unit 804; a clustering unit 806; a grouping unit 808; a synthesis unit 810; a performance unit 812; and a communication unit 814.

The data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. The computer-readable storage medium includes, but is not limited to, volatile memory, non-volatile memory, magnetic and optical storage devices such as disks, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing computer-readable media now known or later developed.

The methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above. When a computer system reads and executes the code and/or data stored on the computer-readable storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the computer-readable storage medium.

Furthermore, the methods and processes described above can be included in hardware modules. For example, the hardware modules can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), and other programmable-logic devices now known or later developed. When the hardware modules are activated, the hardware modules perform the methods and processes included within the hardware modules.

The foregoing embodiments described herein have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the embodiments described herein to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the embodiments described herein. The scope of the embodiments described herein is defined by the appended claims.

What is claimed is:
 1. A computer-implemented method, the method comprising: determining workloads of a set of artificial intelligence (AI) models based on layer information associated with a respective layer of a respective AI model in the set of AI models, wherein the set of AI models are representative of applications that run on a piece of hardware configured to process AI-related operations; forming a set of workload clusters from the determined workloads; determining a representative workload for a workload cluster of the set of workload clusters; determining, using a meta-heuristic, an input size that corresponds to the representative workload; and determining, based on the set of workload clusters, a synthetic AI model configured to generate a workload that represents statistical properties of the determined workloads on the piece of hardware, wherein the input size generates the representative workload at a computational layer of the synthetic AI model.
 2. The method of claim 1, wherein the computational layer of the synthetic AI model corresponds to the workload cluster.
 3. The method of claim 1, further comprising combining the computational layer with a set of computational layers to form the synthetic AI model, wherein a respective computational layer corresponds to a workload cluster of the set of workload clusters.
 4. The method of claim 1, further comprising adding a rectified linear unit (ReLU) layer and a normalization layer to the computational layer, wherein the computational layer is a convolution layer.
 5. The method of claim 1, further comprising determining the representative workload based on a mean or a median of a respective workload in the workload cluster.
 6. The method of claim 1, further comprising determining the input size from an input size group representing individual input sizes of a set of layers of the set of AI models.
 7. The method of claim 6, wherein determining the input size further comprises: setting the representative workload as an objective of the meta-heuristic; setting the individual input sizes and corresponding frequencies as search parameters of the meta-heuristic; and executing the meta-heuristic until reaching within a threshold of the objective.
 8. The method of claim 7, wherein the meta-heuristic is a genetic algorithm and the objective comprises a fitness function of the genetic algorithm.
 9. The method of claim 6, wherein a respective individual input size of the individual input sizes includes number of filters, filter size, and filter stride information of a corresponding layer of the set of layers.
 10. The method of claim 1, further comprising: forming a set of input size groups based on input sizes of layers of the set of AI models; and independently executing the meta-heuristic on a respective input size group of the set of input size groups.
 11. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method, the method comprising: determining workloads of a set of artificial intelligence (AI) models based on layer information associated with a respective layer of a respective AI model in the set of AI models, wherein the set of AI models are representative of applications that run on a piece of hardware configured to process AI-related operations; forming a set of workload clusters from the determined workloads; determining a representative workload for a workload cluster of the set of workload clusters; determining, using a meta-heuristic, an input size that corresponds to the representative workload; and determining, based on the set of workload clusters, a synthetic AI model configured to generate a workload that represents statistical properties of the determined workloads on the piece of hardware, wherein the input size generates the representative workload at a computational layer of the synthetic AI model.
 12. The non-transitory computer-readable storage medium of claim 11, wherein the computational layer of the synthetic AI model corresponds to the workload cluster.
 13. The non-transitory computer-readable storage medium of claim 11, wherein the method further comprises combining the computational layer with a set of computational layers to form the synthetic AI model, wherein a respective computational layer corresponds to a workload cluster of the set of workload clusters.
 14. The non-transitory computer-readable storage medium of claim 11, wherein the method further comprises adding a rectified linear unit (ReLU) layer and a normalization layer to the computational layer, wherein the computational layer is a convolution layer.
 15. The non-transitory computer-readable storage medium of claim 11, wherein the method further comprises determining the representative workload based on a mean or a median of a respective workload in the workload cluster.
 16. The non-transitory computer-readable storage medium of claim 11, wherein the method further comprises determining the input size from an input size group representing individual input sizes of a set of layers of the set of AI models.
 17. The non-transitory computer-readable storage medium of claim 16, wherein determining the input size further comprises: setting the representative workload as an objective of the meta-heuristic; setting the individual input sizes and corresponding frequencies as search parameters of the meta-heuristic; and executing the meta-heuristic until reaching within a threshold of the objective.
 18. The non-transitory computer-readable storage medium of claim 17, wherein the meta-heuristic is a genetic algorithm and the objective comprises a fitness function of the genetic algorithm.
 19. The non-transitory computer-readable storage medium of claim 16, wherein a respective individual input size of the individual input sizes includes number of filters, filter size, and filter stride information of a corresponding layer of the set of layers.
 20. The non-transitory computer-readable storage medium of claim 11, wherein the method further comprises: forming a set of input size groups based on input sizes of layers of the set of AI models; and independently executing the meta-heuristic on a respective input size group of the set of input size groups.